Mathematical definition


Formalizing Complex Mathematical Statements with LLMs: A Study on Mathematical Definitions

Zhang, Lan, Valentino, Marco, Freitas, Andre

arXiv.org Artificial Intelligence

Thanks to their linguistic capabilities, LLMs offer an opportunity to bridge the gap between informal mathematics and formal languages through autoformalization. However, it is still unclear how well LLMs generalize to sophisticated, naturally occurring mathematical statements. To address this gap, we investigate the task of autoformalizing real-world mathematical definitions -- a critical component of mathematical discourse. Specifically, we introduce two novel resources for autoformalization, collecting definitions from Wikipedia (Def_Wiki) and arXiv papers (Def_ArXiv). We then systematically evaluate a range of LLMs, analyzing their ability to formalize definitions into Isabelle/HOL. Furthermore, we investigate strategies to enhance LLMs' performance, including refinement through external feedback from proof assistants, and formal definition grounding, where we guide LLMs with relevant contextual elements from formal mathematical libraries. Our findings reveal that definitions present a greater challenge than existing benchmarks such as miniF2F. In particular, we found that LLMs still struggle with self-correction and with aligning to relevant mathematical libraries. At the same time, structured refinement methods and definition grounding strategies yield notable improvements of up to 16% in self-correction capabilities and 43% in the reduction of undefined errors, highlighting promising directions for enhancing LLM-based autoformalization in real-world scenarios.
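To make the task concrete, here is a toy illustration of what autoformalization produces. It is rendered in Lean for readability (the paper itself targets Isabelle/HOL), and the definition is a standard textbook example, not one drawn from Def_Wiki or Def_ArXiv:

```lean
-- Informal definition: "An integer n is even if there exists
-- an integer k such that n = 2k."
-- A faithful formal counterpart:
def IsEven (n : ℤ) : Prop := ∃ k : ℤ, n = 2 * k
```

The difficulty the abstract points to is not writing such a statement in isolation, but resolving its notions against definitions that already exist in a formal library, which is what the definition grounding strategy addresses.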


Reviews: The streaming rollout of deep networks - towards fully model-parallel execution

Neural Information Processing Systems

A main motivation is to increase the efficiency (e.g., response time) of the network during training/inference. A rollout is a graph that captures the functional dependency of network nodes over time. The authors argue that different possible rollouts have different quality (e.g., response time), introduce mathematical definitions to describe rollouts (e.g., validity / model-parallelizable), and analyze rollouts theoretically and experimentally. In my understanding, the actual conclusion of the paper seems to be: the streaming rollout ("R \equiv 1") is the best; e.g., the Theorem in L192 states that the streaming rollout achieves the lowest response time over the entire graph. The experiments seem to support that conclusion. Note that how to obtain the streaming rollout is not clearly stated by the authors, although the Thm in L192 seems to suggest a working rule for obtaining it. Pro: Originality/Significance: - I'm not aware of earlier work that analyzes this low-level implementation issue, but it is worthwhile to analyze this for optimization purposes.


All models are wrong, but some are harmful

#artificialintelligence

A key challenge in building AI systems is creating a mathematical definition of the subject of interest. This causes a problem because no mathematical definition can capture the full scope of the natural world. This is further complicated by the fact that these definitions are created by teams of AI developers and will inevitably reflect the attitudes of these developers. This issue affects all AI systems. But that doesn't mean we shouldn't build any models for anything. A model can bring a benefit, even if it isn't perfect.


An overview of Principal Component Analysis

#artificialintelligence

This article will explain to you what Principal Component Analysis (PCA) is, why we need it, and how we use it. I will try to make it as simple as possible while avoiding hard examples or words that can cause a headache. A moment of honesty: to fully understand this article, a basic understanding of linear algebra and statistics is essential. Let's say we have 10 variables in our dataset, and let's assume that 3 of those variables capture 90% of the dataset's variance while the other 7 capture only 10%. Now suppose we want to visualize those 10 variables.
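The 10-variable scenario above can be sketched in a few lines. This is a minimal illustration, assuming scikit-learn is available; the synthetic data is constructed so that roughly 3 directions carry most of the variance:

```python
# Sketch: generate a 10-variable dataset whose variance is dominated by
# 3 underlying directions, then let PCA report how much variance each
# principal component captures.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))          # 3 dominant directions
mixing = rng.normal(size=(3, 10))           # spread them over 10 variables
noise = 0.1 * rng.normal(size=(200, 10))    # small residual variance
X = latent @ mixing + noise

pca = PCA(n_components=10)
pca.fit(X)
explained = pca.explained_variance_ratio_
# The first 3 components should capture the bulk of the variance,
# so plotting just those 3 loses little information.
print(explained[:3].sum())
```

In this setting, keeping the first 3 components gives a low-dimensional view that preserves most of the structure, which is exactly the visualization use case the article describes.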


Exploring Inherent Properties of the Monophonic Melody of Songs

Wang, Zehao, Zhang, Shicheng, Chen, Xiaoou

arXiv.org Artificial Intelligence

Melody is one of the most important components in music. Unlike other components of music theory, such as harmony and counterpoint, computable features for melody are urgently needed. These features are in high demand as data-driven methods come to dominate fields such as music information retrieval and automatic music composition. To boost the performance of deep-learning-related musical tasks, we propose a set of interpretable features on monophonic melody for computational purposes. These features are defined not only in mathematical form, but also with some consideration of composers' intuition. For example, the Melodic Center of Gravity reflects the sentence-wise contour of the melody, while the local/global melody dynamics quantify the dynamics of a melody, coupling pitch and time within a sentence. We found that these features are considered by people universally across many genres of songs, even in atonal composition practices. Hopefully, these melodic features can provide novel inspiration for future researchers as a tool in the fields of MIR and automatic composition.
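One plausible reading of the Melodic Center of Gravity is a duration-weighted mean pitch for a phrase; the paper's exact formula may differ, so treat this as an illustrative sketch, not the authors' definition:

```python
# Hedged sketch: a "center of gravity" for a monophonic phrase,
# computed as the average MIDI pitch weighted by note duration.

def melodic_center_of_gravity(notes):
    """notes: list of (midi_pitch, duration) pairs for one phrase."""
    total_duration = sum(d for _, d in notes)
    return sum(p * d for p, d in notes) / total_duration

# A short ascending phrase: the center sits between the pitches,
# pulled toward the longer final note.
phrase = [(60, 1.0), (62, 1.0), (64, 2.0)]  # C4, D4, E4 (held longer)
print(melodic_center_of_gravity(phrase))  # 62.5
```

Computed phrase-by-phrase, such a value traces a sentence-wise contour of the melody, which matches the intuition the abstract describes.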


mAP (mean Average Precision) for Object Detection

#artificialintelligence

AP (Average Precision) is a popular metric for measuring the accuracy of object detectors such as Faster R-CNN and SSD. Average precision computes the average precision value over recall values from 0 to 1. It sounds complicated but is actually pretty simple, as we illustrate with an example. But before that, we will do a quick recap of precision, recall, and IoU. Precision measures how accurate your predictions are. Recall measures how well you find all the positives.
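A minimal sketch of these quantities for a single class, assuming detections are already sorted by confidence and matched to ground truth at a chosen IoU threshold (True = correct match). Note that benchmark variants differ (e.g., PASCAL VOC's 11-point interpolation, COCO's averaging over IoU thresholds); this uses the simplest non-interpolated form:

```python
# Sketch: precision/recall at each rank, and AP as the mean of the
# precision values at the ranks where a true positive occurs.

def precision_recall_curve(matches, num_gt):
    """matches: bools for ranked detections; num_gt: total ground-truth boxes."""
    tp = 0
    points = []
    for rank, is_match in enumerate(matches, start=1):
        tp += is_match
        points.append((tp / rank, tp / num_gt))  # (precision, recall)
    return points

def average_precision(matches, num_gt):
    """Average the precision at each rank where recall increases."""
    precisions = [p for p, _ in precision_recall_curve(matches, num_gt)]
    return sum(p for p, m in zip(precisions, matches) if m) / num_gt

# 5 ranked detections, 4 ground-truth objects, 3 correctly matched:
matches = [True, True, False, True, False]
print(average_precision(matches, num_gt=4))  # (1 + 1 + 0.75) / 4 = 0.6875
```

Averaging this per-class AP over all classes gives mAP, the headline number reported for detectors.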